An Adaptive Grid-based Method for Clustering Multi-Dimensional Online Data Streams
Abstract
Clustering is an important task in mining evolving data streams. Many data streams are high dimensional in nature. Clustering in a high dimensional data space is a complex problem, and it is inherently more complex for data streams. Most data stream clustering methods cannot handle high dimensional data streams well and therefore sacrifice the accuracy of the clusters. To solve this problem, we propose an adaptive grid-based clustering method. Our focus is on providing up-to-date, arbitrarily shaped clusters while improving processing time and bounding memory usage. In our method (B+C tree), a structure called the “Bcell tree” keeps the recent information of a data stream. To reduce the complexity of clustering, a structure called the “cluster tree” is proposed to maintain multi-dimensional clusters. The cluster tree yields high quality clusters by keeping the boundaries of clusters in a semi-optimal way; it also captures the dynamic changes of data streams and adjusts the clusters accordingly. Our performance study over a number of real and synthetic data streams demonstrates the scalability of the algorithm with respect to the number of dimensions and the amount of data, without sacrificing the accuracy of the identified clusters.
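The abstract does not specify how the Bcell tree or the cluster tree are built, so the following is only a minimal sketch of the general idea behind grid-based stream clustering, not the B+C tree method itself. It assumes a fixed cell width, an exponential decay factor to favor recent data, and a density threshold; the class name `GridStreamClusterer` and all of its parameters are hypothetical choices made for illustration.

```python
class GridStreamClusterer:
    """Illustrative grid-based stream clusterer (NOT the paper's B+C tree).

    Assumptions: fixed-width cells, exponential decay of cell weights,
    and clusters formed from face-adjacent dense cells.
    """

    def __init__(self, cell_width=1.0, decay=0.99, dense_threshold=2.0):
        self.cell_width = cell_width
        self.decay = decay
        self.dense_threshold = dense_threshold
        self.cells = {}  # cell index -> (weight, tick of last update)
        self.tick = 0    # one tick per inserted point

    def cell_of(self, point):
        # Map a point to the integer index of the grid cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def weight(self, cell):
        # Decay is applied lazily, based on the ticks elapsed since the last update.
        if cell not in self.cells:
            return 0.0
        stored, last = self.cells[cell]
        return stored * self.decay ** (self.tick - last)

    def insert(self, point):
        # Each arriving point adds one unit of weight to its cell.
        self.tick += 1
        cell = self.cell_of(point)
        self.cells[cell] = (self.weight(cell) + 1.0, self.tick)

    def prune(self, min_weight=0.01):
        # Drop cells whose decayed weight is negligible, to bound memory.
        for cell in [c for c in self.cells if self.weight(c) < min_weight]:
            del self.cells[cell]

    def clusters(self):
        # A cluster is a connected component of dense, face-adjacent cells,
        # so clusters can take arbitrary shapes.
        dense = {c for c in self.cells if self.weight(c) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                # Only 2 * d neighbors are checked, which scales linearly in d.
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(component)
        return result


if __name__ == "__main__":
    clusterer = GridStreamClusterer(cell_width=1.0, decay=0.99, dense_threshold=1.5)
    for p in [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (1.6, 0.4), (9.0, 9.0)]:
        clusterer.insert(p)
    clusterer.prune()
    print(clusterer.clusters())
```

Checking only face-adjacent neighbors keeps the per-cell cost linear in the number of dimensions, which is one simple way to stay tractable in high dimensional spaces; an adaptive method such as the one described above would presumably also adjust cell boundaries, which this sketch does not attempt.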
Publication date: 2012